Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs

Authors

  • Mahdie Karbasi
  • Ahmed Hussen Abdelaziz
  • Hendrik Meutzner
  • Dorothea Kolossa
Abstract

Automatic prediction of speech intelligibility is highly desirable in the speech research community, since listening tests are time-consuming and cannot be used online. Most of the available objective speech intelligibility measures are intrusive methods, as they require a clean reference signal in addition to the corresponding noisy/processed signal at hand. To overcome the problem of predicting speech intelligibility in the absence of the clean reference signal, we proposed in [1] to employ a recognition/synthesis framework called the twin hidden Markov model (THMM) to synthesize the clean features required by an intrusive intelligibility prediction method. This framework can predict speech intelligibility as well as well-known intrusive methods such as the short-time objective intelligibility (STOI) measure. The original THMM, however, requires the correct transcription for synthesizing the clean reference features, which is not always available. In this paper, we go one step further and investigate the use of the recognized transcription instead of the oracle transcription to obtain a more widely applicable speech intelligibility predictor. We show that the output of the newly proposed blind approach is highly correlated with human speech recognition results collected via crowdsourcing in different noise conditions.
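The abstract's final claim, that the predictor's output is highly correlated with human recognition results, can be illustrated with a minimal sketch. It assumes that "correlated" is quantified with the Pearson coefficient, which the abstract does not specify. The function name `pearson_correlation` and all numeric values are hypothetical placeholders for illustration only; they are not results from the paper.

```python
import math


def pearson_correlation(predicted, observed):
    """Pearson correlation between two equally long sequences of scores."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical placeholder values, one per noise condition (NOT from the paper):
# predicted intelligibility scores and human word recognition accuracy.
predicted_scores = [0.35, 0.50, 0.65, 0.80, 0.90]
human_word_accuracy = [0.20, 0.45, 0.70, 0.85, 0.95]

r = pearson_correlation(predicted_scores, human_word_accuracy)
print(f"Pearson r = {r:.3f}")
```

In such an evaluation, each pair would correspond to one noise condition, with the human score averaged over the listeners' responses for that condition.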

Similar resources

Modeling Noise Influence to Speech Intelligibility Non-Intrusively by Reduced Speech Dynamic Range

The influence of noise on the speech signal waveform can be characterized by reduced speech dynamic range (rDR). This motivated the present work to propose an rDR-based intelligibility measure (denoted as rDRm) that can non-intrusively (i.e., without requiring a clean reference speech signal) predict speech intelligibility in noise and is computed using only the dynamic range extracted from th...

Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features

The Interspeech ComParE 2015 PC Sub-Challenge consists of automatically determining the degree of Parkinson’s condition using exclusively the patient’s voice. In this paper, we treat this problem as a regression task and propose the use of an ensemble learning method, Random Forest (RF), in combination with features of different kinds: acoustic characteristics, features ...

Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement

Models for automatic speech recognition (ASR) hold detailed information about spectral and spectro-temporal characteristics of clean speech signals. Using these models for speech enhancement is desirable and has been the target of past research efforts. In such model-based speech enhancement systems, a powerful ASR is imperative. To increase the recognition rates especially in low-SNR condition...

Predicting the bilateral advantage in cochlear implantees using a non-intrusive speech intelligibility measure

A measure to predict speech intelligibility in unilateral and bilateral cochlear implant (CI) users is proposed that does not need a priori information (i.e. is non-intrusive), such as the room acoustics. This measure, termed BiSIMCI, combines an equalization-cancellation stage with a modulation frequency estimation stage. Simulated and actual subjective data from CI users were used t...

Using HMMs and ANNs for mapping acoustic to visual speech

In this paper we present two different methods for mapping auditory, telephone-quality speech to visual parameter trajectories, specifying the movements of an animated synthetic face. In the first method, Hidden Markov Models (HMMs) were used to obtain phoneme strings and time labels. These were then transformed by rules into parameter trajectories for visual speech synthesis. In the second m...

Publication date: 2016